PodCastle: collaborative training of acoustic models on the basis of wisdom of crowds for podcast transcription

Authors

Jun Ogata

Masataka Goto

Abstract

This paper presents acoustic-model-training techniques for improving automatic transcription of podcasts. A typical approach to acoustic modeling is to create a task-specific corpus that includes hundreds (or even thousands) of hours of speech data and their accurate transcriptions. This approach, however, is impractical for the podcast-transcription task, because manually transcribing the large amounts of speech needed to cover all the various types of podcast content would be too costly and time-consuming. To solve this problem, we introduce collaborative training of acoustic models on the basis of wisdom of crowds, i.e., the transcriptions of podcast speech data are generated by anonymous users on our web service PodCastle. We then describe a podcast-dependent acoustic modeling system that uses RSS metadata to deal with the differences in acoustic conditions across podcast speech data. Experimental results on actual podcast speech data confirmed the effectiveness of the proposed acoustic model training.

Similar resources

PodCastle: Collaborative Training of Language Models on the Basis of Wisdom of Crowds

This paper presents a language-model training method for improving automatic transcription of online spoken contents. Unlike previously studied LVCSR tasks such as broadcast news and lectures, large-sized task-specific corpora for training language models cannot be prepared and used in recognition because of the diversity of topics, vocabularies, and speaking styles. To overcome difficulties in...

Automatic Transcription for a Web 2.0 Service to Search Podcasts (INTERSPEECH 2007)

This paper describes speech recognition techniques that enable a Web 2.0 service “PodCastle” where users can search and read transcribed texts of podcasts, and correct recognition errors in those texts. Most previous speech recognizers had difficulties transcribing podcasts because podcasts include various kinds of contents recorded in different conditions and cover recent topics that tend to h...

Podcastle: Improvements of Speech Recognition by Using Acoustic Modeling Based on Wisdom of Crowds

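1 Introduction. We develop and operate PodCastle 1)2)3), a social annotation system that automatically converts podcasts to text through speech recognition, so that users can not only run full-text searches over them but also browse and edit them in detail. Podcasts are diverse speech data recorded in real environments, and conventional speech recognition technology has difficulty achieving high recognition accuracy on them. PodCastle therefore adopts a framework in which many users cooperate by correcting (annotating) recognition errors, so that recognition accuracy improves while the system is in operation. The aim is not only to improve the quality of the search service but also to raise the baseline of speech recognition technology. In this study, as part of this framework, the collective intelligence obtained through PodCastle, namely users' corrections of speech recognition errors, is used for acoustic...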

Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge

The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory has been used in clustering problems. Previous research showed that this theory can significantly increase the stability and performance of...

PodCastle: A Spoken Document Retrieval Service Improved by Anonymous User Contributions

In this invited paper, we introduce a public web service, PodCastle, that provides full-text searching of speech data (Japanese podcasts) on the basis of automatic speech recognition technologies. This is an instance of our research approach, Speech Recognition Research 2.0, which is aimed at providing users with a web service based on Web 2.0 so that they can experience state-of-the-art speech...

Publication date: 2009
